feat(pt_expt): add dp finetune support #5331
wanghan-iapcm merged 7 commits into deepmodeling:master
Conversation
Add `--finetune`, `--model-branch`, and `--use-pretrain-script` support to `dp --pt-expt train`. The implementation mirrors the pt backend's finetune flow: load pretrained checkpoint, optionally change type map, selectively copy weights (descriptor always from pretrained, fitting conditionally), and adjust output bias. Also fix a bug in dpmodel's base_atomic_model.change_type_map where out_bias/out_std were not extended before remapping when the new type map introduces unseen types, causing an IndexError with negative remap indices.
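The `change_type_map` fix described above can be sketched as follows (the function name, array shapes, and remap convention here are illustrative assumptions, not the actual deepmd code): grow `out_bias`/`out_std` along the type axis *before* applying the remap, so that the negative indices assigned to unseen types land in the freshly appended slots instead of wrapping around into old types.

```python
import numpy as np

def extend_then_remap(out_bias, out_std, remap_index, has_new_type, n_new):
    """Sketch of the bug fix: extend per-type stats BEFORE remapping.

    out_bias/out_std have shape (nvar, ntypes, ndim); remap_index maps
    each slot of the new type map to an old type index, with negative
    indices for types unseen in the pretrained model (all names here
    are illustrative assumptions).
    """
    if has_new_type:
        extend_shape = (out_bias.shape[0], n_new, *out_bias.shape[2:])
        # zero bias and unit std for the previously unseen types
        out_bias = np.concatenate(
            [out_bias, np.zeros(extend_shape, dtype=out_bias.dtype)], axis=1
        )
        out_std = np.concatenate(
            [out_std, np.ones(extend_shape, dtype=out_std.dtype)], axis=1
        )
    # without the extension above, the negative indices below would
    # silently pick up an old type's statistics instead of the new slots
    return out_bias[:, remap_index], out_std[:, remap_index]
```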
Extend the finetune flow to accept .pte frozen models as the pretrained source, in addition to .pt checkpoints. The .pte file is loaded via serialize_from_file + BaseModel.deserialize to reconstruct the pretrained model with weights. Embed model_params in the .pte archive during freeze so that --use-pretrain-script works with .pte sources. Older .pte files without embedded model_params fall back to a minimal dict with just type_map. Add weight consistency checks to CLI tests (lr=1e-30 to prevent training from modifying weights) verifying descriptor and fitting weights match the pretrained model after finetune initialization.
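The weight-consistency check added to the CLI tests can be sketched like this (a numpy stand-in for the real torch state dicts; the helper name is hypothetical). With lr=1e-30 an optimizer step cannot perceptibly move the weights, so comparing against the pretrained state after a short finetune run still validates the initialization:

```python
import numpy as np

def assert_weights_inherited(ft_state, pre_state, random_fitting=False):
    """Sketch of the test's consistency check (names are assumptions):
    descriptor weights must always match the pretrained model, fitting
    weights only when they were not randomly re-initialized."""
    for key, pre in pre_state.items():
        if ".descriptor" in key:
            np.testing.assert_allclose(ft_state[key], pre)
        elif ".fitting" in key and not random_fitting:
            np.testing.assert_allclose(ft_state[key], pre)
```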
The DPA1 test_finetune_change_type bias-adjusted comparison failed because the two trainers (with different type maps) sampled different data frames for bias adjustment. The data set has 80 frames but data_stat_nbatch=1 sampled only 1 frame, and the frame selection depended on numpy RNG state which differed between the two trainers. Fix by subsampling the data to 2 frames in TestEnergyModelDPA1 and using batch_size=2 so all frames are consumed deterministically.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c1be2ec5ef
📝 Walkthrough

Adds pt_expt fine-tuning flow: expands type-map remap to grow per-type out_bias/out_std for new atom types, embeds/extracts model_params in frozen artifacts, introduces finetune rule extraction and Trainer logic for selective pretrained weight transfer and bias adjustment, and adds unit and e2e tests.
Sequence Diagram(s)

sequenceDiagram
actor User
participant CLI as CLI (entrypoint)
participant Config as Config
participant Finetune as FinetuneRules
participant Serializer as Serializer
participant Trainer as Trainer
participant Model as Model
User->>CLI: dp --pt-expt train --finetune model.pte --use-pretrain-script
CLI->>Config: load config & init model
CLI->>Finetune: get_finetune_rules(model.pte, model_config, model_branch)
Finetune->>Serializer: serialize_from_file(.pte/.pt)
Serializer-->>Finetune: model data + model_params
Finetune-->>CLI: finetune_links
CLI->>Trainer: Trainer(finetune_model, finetune_links)
Trainer->>Serializer: deserialize pretrained (.pte/.pt)
Trainer->>Trainer: determine resume vs finetune rules
Trainer->>Model: selective weight transfer (descriptor / fitting / _extra_state)
Trainer->>Model: check finetune_links.Default.get_has_new_type()
alt new types present
Trainer->>Model: change_type_map(new_type_map) -> expand out_bias/out_std + remap
end
Trainer->>Model: model_change_out_bias(sample_func, mode)
Trainer->>Model: start finetune training
Trainer->>Serializer: deserialize_to_file(output.pte, data, model_params)
Serializer-->>User: saved checkpoint (.pte) with embedded model_params
sequenceDiagram
participant Trainer
participant Pretrained as PretrainedCheckpoint
participant Target as TargetModel
participant Rule as FinetuneRule
Trainer->>Pretrained: load weights (.pte/.pt)
Trainer->>Rule: get_has_new_type()
alt New Types Detected
Trainer->>Target: change_type_map(new_type_map)
Target->>Target: expand out_bias/out_std then remap
end
Trainer->>Target: copy descriptor weights from pretrained
Trainer->>Rule: get_random_fitting()
alt Keep Random Fitting
Note over Target: keep random init for fitting params
else Use Pretrained Fitting
Trainer->>Target: copy fitting weights from pretrained
end
Trainer->>Target: change_out_bias(mode)
Target->>Target: adjust bias via statistics
Actionable comments posted: 5
Inline comments:
In `@deepmd/pt_expt/train/training.py`:
- Around line 384-385: The current logic sets resume_model = init_model or
restart_model or finetune_model so finetune can incorrectly pick init/restart
checkpoints; change this by (a) validating inputs up front in the function that
defines init_model/restart_model/finetune_model and raise an error if more than
one of those is set, OR (b) keep the existing resume_model variable but in the
finetune branch explicitly load weights from finetune_model (not resume_model)
and use that checkpoint to populate descriptor/fitting weights and
_extra_state["model_params"]; update both the initial resume/resuming block
(resume_model/resuming) and the finetune-specific code region (~lines 487-527)
to follow the chosen approach so finetune never inherits init/restart weights.
- Around line 991-1002: The log attempts to convert CUDA tensors returned by
_model.get_out_bias() to numpy via np.asarray which raises RuntimeError on CUDA;
replace the np.asarray(...) calls with to_numpy_array(...) from
deepmd.dpmodel.common when building the log message after calling
_model.change_out_bias (and similarly anywhere else you call np.asarray on
_model.get_out_bias()), so call to_numpy_array(old_bias).reshape(-1) and
to_numpy_array(new_bias).reshape(-1) (slicing by len(model_type_map) as before)
to ensure device-safe conversion for logging.
In `@deepmd/pt_expt/utils/finetune.py`:
- Around line 35-40: The code currently falls back to returning only
{"type_map": ...} when serialize_from_file(finetune_model) lacks "model_params",
which silently allows change_model_params=True to proceed with incomplete
config; modify the logic in the finetune model-loading blocks (where
serialize_from_file, finetune_model is used and again in the block around lines
79-92) to detect when change_model_params is True and "model_params" is missing,
and immediately raise a clear error (including mention of using
--use-pretrain-script or that legacy .pte lacks model_params.json) instead of
returning the minimal dict; ensure the error path prevents calling
get_finetune_rule_single with incomplete input.
In `@source/tests/consistent/model/test_ener.py`:
- Around line 1333-1423: The test wrongly sets dp_std_orig =
to_numpy_array(dp_model.get_out_bias()) instead of snapshotting the original
out_std and then never asserts remapping for old types; change the snapshot to
dp_std_orig = to_numpy_array(dp_model.atomic_model.out_std) (and similarly
ensure any other std snapshots use atomic_model.out_std), seed a non-trivial
out_std on dp_model before change_type_map, then add assertions that the
remapped old entries land at indices 3 and 0 (compare dp_std_new[:, 3, :] to
dp_std_orig[:, 0, :] for "O" and dp_std_new[:, 0, :] to dp_std_orig[:, 1, :] for
"H"), keep cross-backend equality checks (pt_model, pt_expt_model) and remove or
use any now-unused locals; run ruff check . and ruff format . before committing.
In `@source/tests/pt_expt/test_finetune.py`:
- Around line 564-580: The loop comparing ft_state and pre_state must, when
random_fitting is True, assert that fitting tensors are not all identical:
locate the loop over ft_state and the variables ft_state, pre_state and
random_fitting; gather keys containing ".fitting" present in both ft_state and
pre_state and assert that at least one of those tensors differs (e.g., by
checking torch.any(ft_state[k] != pre_state[k]) for at least one k), failing the
test if all fitting tensors are equal.
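The last suggestion, asserting that randomly re-initialized fitting weights actually differ from the pretrained ones, can be sketched with numpy stand-ins for the torch tensors (the helper name is hypothetical):

```python
import numpy as np

def assert_fitting_reinitialized(ft_state, pre_state):
    """Sketch of the stronger check: when random_fitting is requested,
    at least one shared fitting tensor must differ from the pretrained
    weights, otherwise the test would pass vacuously."""
    fitting_keys = [k for k in ft_state if ".fitting" in k and k in pre_state]
    assert fitting_keys, "no shared fitting tensors to compare"
    assert any(
        np.any(ft_state[k] != pre_state[k]) for k in fitting_keys
    ), "all fitting tensors equal the pretrained weights"
```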
📒 Files selected for processing (7)
- deepmd/dpmodel/atomic_model/base_atomic_model.py
- deepmd/pt_expt/entrypoints/main.py
- deepmd/pt_expt/train/training.py
- deepmd/pt_expt/utils/finetune.py
- deepmd/pt_expt/utils/serialization.py
- source/tests/consistent/model/test_ener.py
- source/tests/pt_expt/test_finetune.py
Older .pte files (or those produced by external code calling deserialize_to_file without model_params) lack the embedded model_params.json. When --use-pretrain-script is used with such files, get_finetune_rule_single would crash with a KeyError on "descriptor". Add an explicit check with a clear error message.
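A minimal sketch of such a check (the function and key names are assumptions; only the embedded-model_params convention comes from this PR):

```python
def load_pretrain_params(data: dict, finetune_model: str,
                         use_pretrain_script: bool) -> dict:
    """Sketch of the explicit check: legacy .pte archives carry no
    embedded model_params.json, so fail loudly instead of crashing
    later with a KeyError on "descriptor"."""
    if "model_params" in data:
        return data["model_params"]
    if use_pretrain_script:
        raise RuntimeError(
            f"{finetune_model} has no embedded model_params "
            "(legacy .pte without model_params.json); re-freeze the "
            "model or drop --use-pretrain-script."
        )
    # minimal fallback: only the type map survives
    return {"type_map": data["type_map"]}
```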
Codecov Report

❌ Patch coverage is

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #5331      +/-   ##
==========================================
+ Coverage   82.40%   82.42%   +0.02%
==========================================
  Files         783      784       +1
  Lines       79031    79124      +93
  Branches     3675     3675
==========================================
+ Hits        65122    65219      +97
+ Misses      12736    12731       -5
- Partials     1173     1174       +1

☔ View full report in Codecov by Sentry.
- Reject combining finetune_model with init_model/restart_model
- Use to_numpy_array instead of np.asarray in model_change_out_bias for CUDA tensor safety
- Remove unused variables dp_std_orig/dp_std_before in test_ener.py
- Add out_std remap correctness assertion for old types
- Assert fitting weights differ (not just skip) for random_fitting=True, excluding bias_atom_e which is set by bias adjustment
🧹 Nitpick comments (1)
source/tests/pt_expt/test_finetune.py (1)
122-132: Minor: Redundant import.
`shutil` is already imported at line 16; the local `import shutil as _shutil` on line 124 is unnecessary.

Suggested fix:

```diff
 def _subsample_data(src_dir: str, dst_dir: str, nframes: int = 2) -> None:
     """Copy a data system, keeping only the first *nframes* frames."""
-    import shutil as _shutil
-
-    _shutil.copytree(src_dir, dst_dir, dirs_exist_ok=True)
+    shutil.copytree(src_dir, dst_dir, dirs_exist_ok=True)
     set_dir = os.path.join(dst_dir, "set.000")
```
📒 Files selected for processing (3)

- deepmd/pt_expt/train/training.py
- source/tests/consistent/model/test_ener.py
- source/tests/pt_expt/test_finetune.py
✅ Files skipped from review due to trivial changes (1)
- source/tests/consistent/model/test_ener.py
…in finetune tests

Replace np.asarray() with to_numpy_array() when converting model bias tensors to numpy arrays. np.asarray() fails on CUDA tensors with "can't convert cuda:0 device type tensor to numpy", while to_numpy_array() handles device transfer automatically.
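The device-safe conversion can be sketched as follows (a simplified stand-in for `deepmd.dpmodel.common.to_numpy_array`, whose real implementation may differ):

```python
import numpy as np

def to_numpy_array(x):
    """Sketch: move a possibly-CUDA torch tensor to CPU before handing
    it to numpy, since np.asarray() raises on CUDA tensors."""
    if x is None:
        return None
    if hasattr(x, "detach"):  # torch.Tensor, possibly on a GPU
        return x.detach().cpu().numpy()
    return np.asarray(x)
```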
# Conflicts:
#   deepmd/pt_expt/utils/serialization.py
Summary
- Add `--finetune`, `--model-branch`, and `--use-pretrain-script` support to `dp --pt-expt train`, mirroring the pt backend's finetune flow (load pretrained checkpoint, change type map, selective weight copy, output bias adjustment).
- Accept both `.pt` checkpoints and frozen `.pte` models as the pretrained source (embed `model_params` in `.pte` during freeze for `--use-pretrain-script`).
- Fix a bug in `base_atomic_model.change_type_map` where `out_bias`/`out_std` were not extended before remapping when the new type map introduces unseen types, causing `IndexError` with negative remap indices.

Usage examples
Files changed
- `deepmd/pt_expt/utils/finetune.py`: `get_finetune_rules()` for pt_expt, supports `.pt` and `.pte`
- `deepmd/pt_expt/entrypoints/main.py`: wire `--finetune`/`--model-branch`/`--use-pretrain-script` through `train()` → `get_trainer()` → `Trainer`; pass `model_params` to `.pte` during freeze
- `deepmd/pt_expt/train/training.py`: finetune weight transfer in `Trainer.__init__` (`.pt` and `.pte`); `model_change_out_bias()`
- `deepmd/pt_expt/utils/serialization.py`: embed `model_params.json` in the `.pte` archive
- `deepmd/dpmodel/atomic_model/base_atomic_model.py`: fix `change_type_map` to extend `out_bias`/`out_std` for new types (array-api compatible)
- `source/tests/pt_expt/test_finetune.py`: tests for `.pte` finetune, `--use-pretrain-script`, `random_fitting`, inherited weight consistency
- `source/tests/consistent/model/test_ener.py`: `test_change_type_map_new_type` verifying `out_bias`/`out_std` extension across dp, pt, pt_expt

Test plan
- `python -m pytest source/tests/pt_expt/test_finetune.py -v` (9 passed)
- `python -m pytest source/tests/pt_expt/test_training.py -v` (11 passed, no regression)
- `python -m pytest source/tests/consistent/model/test_ener.py -k change_type_map -v` (3 passed)
- `python -m pytest source/tests/consistent/descriptor/test_se_e2_a.py -v` (351 passed, no regression)